{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dbaa7c4b-20c8-4e04-a7cc-2c2fad8e6999",
   "metadata": {},
   "source": [
    "# QM-only Calculation \n",
    "## Objective\n",
    "\n",
    "The objective of this tutorial is to use QMzyme to create a truncated active site composed of residues within 5 Å of the bound ligand. All C-alpha atoms are set to be frozen during geometry optimization.\n",
    "\n",
    "This workflow allows you to:\n",
    "\n",
    "- Acquire basic understanding behind QM-only input file generation\n",
    "- Understand the foundation behind each \n",
    "\n",
    "In this specific example, we are using ketosteroid isomerase (KSI) as the model system. The structure for KSI is obtained from the PDB [1OH0](https://doi.org/10.2210/pdb1OH0/pdb) and MM-minimized prior to this tutorial. We will utilize it to create an input file for ORCA for geometry optimization and frequency calculation.\n",
    "\n",
    "## Classes used in this example\n",
    "\n",
    "- [GenerateModel](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.GenerateModel.html)\n",
    "- [QM_Method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment)\n",
    "- [CA_terminal TruncationScheme](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.TruncationSchemes.html#QMzyme.TruncationSchemes.CA_terminal)\n",
    "- [DistanceCutoff SelectionScheme](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#QMzyme.SelectionSchemes.DistanceCutoff)\n",
    "\n",
    "## Required Files\n",
    "\n",
    "To start, you will need:\n",
    "\n",
    "- A fully prepped and protonated PDB.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "656289a3-2d1c-49a9-aaed-69fcca90e2f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here are the necesary imports for this tutorial!\n",
    "\n",
    "import QMzyme \n",
    "from QMzyme.data import PDB"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "869ac5af-96c8-4e7a-b505-d7f2d8f663d4",
   "metadata": {},
   "source": [
    "## Initialize Model and Define Regions\n",
    "\n",
    "To begin our example workflow, we need to start by initializing QMzymeModel using `GenerateModel()`.When loading the `PDB` file, we quickly run into an issue: a nonconventional residue is found. This simply means that there is a non-AMBER protein residue with an unknown charge. This can be resolved using `residue_charge.update()`, where we can manually update the charge of an unknown residue. Afterwards, `set_catalytic_center()` is used to designate a catalytic center in QMzymeModel. This is utilized in the SelectionScheme class to acquire the desired QM region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "790aeb66-da33-4940-9401-b2bea7758ba2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.\n",
      "\n",
      "\tNonconventional Residues Found\n",
      "\t------------------------------\n",
      "\tEQU --> Charge: UNK, defaulting to 0\n",
      "\n",
      "You can update charge information for nonconventional residues by running \n",
      "\t>>>QMzyme.data.residue_charges.update({'3LETTER_RESNAME':INTEGER_CHARGE}). \n",
      "Note your changes will not be stored after you exit your session. It is recommended to only alter the residue_charges dictionary. If you alter the protein_residues dictionary instead that could cause unintended bugs in other modules (TruncationSchemes).\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# We first initilize QMzymeModel by loading structure file.\n",
    "# The initial pdb file should be prepared prior (hydrogens must be present).\n",
    "model = QMzyme.GenerateModel(PDB)\n",
    "\n",
    "# This adds unknown residue charge.\n",
    "QMzyme.data.residue_charges.update({'EQU': -1})\n",
    "\n",
    "# This is used to set catalytic center.\n",
    "model.set_catalytic_center(selection='resid 263')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac419209-0320-494b-ae0f-848077b05c86",
   "metadata": {},
   "source": [
    "For this workflow, we will utilize the DistanceCutoff class, which selects residues based on their distance from a user-defined catalytic center. Specifically, the workflow identifies atoms within a specified cutoff distance of the catalytic center and determines which residues should be included in the resulting region. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "aa4b3ec5-49d8-4592-afa9-6ab8bd7845e7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<QMzymeRegion catalytic_center contains 37 atom(s) and 1 residue(s)>, <QMzymeRegion cutoff_5 contains 427 atom(s) and 33 residue(s)>]\n"
     ]
    }
   ],
   "source": [
    "# Import selection scheme 'DistanceCutoff' and use it to create 5 Å region around catalytic center.\n",
    "from QMzyme.SelectionSchemes import DistanceCutoff\n",
    "model.set_region(selection=DistanceCutoff, cutoff=5)\n",
    "print(model.regions)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8aea48de-fedd-4fdc-b0a0-bdf11f5651f5",
   "metadata": {},
   "source": [
    "## Designate Coordinate Constraints\n",
    "\n",
    "After acquiring the region, it is important to set a fixed atom. This is done by selecting a specific atom of choice and freezing them using `set_fixed_atoms()`. For our case, we will freeze C-alpha atoms for geometry optimization. To achieve, this, we will create a selection of atoms with the name CA, and freezing these atoms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8d511ea4-2fc5-44fb-88d7-3998b6385253",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fixed atoms:  [<QMzymeAtom 235: CA of resname TYR, resid 16>, <QMzymeAtom 256: CA of resname ILE, resid 17>, <QMzymeAtom 309: CA of resname VAL, resid 20>, <QMzymeAtom 595: CA of resname ASP, resid 40>, <QMzymeAtom 615: CA of resname PRO, resid 41>, <QMzymeAtom 851: CA of resname TYR, resid 57>, <QMzymeAtom 896: CA of resname GLN, resid 59>, <QMzymeAtom 913: CA of resname GLY, resid 60>, <QMzymeAtom 920: CA of resname LEU, resid 61>, <QMzymeAtom 982: CA of resname VAL, resid 66>, <QMzymeAtom 1022: CA of resname ALA, resid 68>, <QMzymeAtom 1225: CA of resname MET, resid 84>, <QMzymeAtom 1256: CA of resname PHE, resid 86>, <QMzymeAtom 1300: CA of resname VAL, resid 88>, <QMzymeAtom 1331: CA of resname MET, resid 90>, <QMzymeAtom 1461: CA of resname LEU, resid 99>, <QMzymeAtom 1492: CA of resname VAL, resid 101>, <QMzymeAtom 1527: CA of resname ASH, resid 103>, <QMzymeAtom 1743: CA of resname MET, resid 116>, <QMzymeAtom 1777: CA of resname ALA, resid 118>, <QMzymeAtom 1808: CA of resname TRP, resid 120>, <QMzymeAtom 1888: CA of resname LEU, resid 125>]\n"
     ]
    }
   ],
   "source": [
    "# Selecting and freeznig CA atoms.\n",
    "c_alpha_atoms = model.cutoff_5.get_atoms(attribute='name', value='CA')\n",
    "model.cutoff_5.set_fixed_atoms(atoms=c_alpha_atoms)\n",
    "\n",
    "print(\"Fixed atoms: \",model.cutoff_5.get_atoms(attribute='is_fixed', value=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0452bc54-1da1-4587-a835-03e368cd7c8a",
   "metadata": {},
   "source": [
    "## Build the QM Method and Assign to Region\n",
    "\n",
    "Ultimately, QMzymeModel is a collection of QMzymeRegion and a QM method for calculation. This can be achieved by utilizing  `QM_Method()` and assigning QM method to the region. There are multiple attributes that can be given to the method, which will impact the input file! Some of the essential options are:\n",
    "\n",
    "- basis_set: defines the basis set to use for calculation.\n",
    "- functional: defines the functional to use for calculation.\n",
    "- qm_input: keywords to include in the input file route line to declare any details.\n",
    "- program: defines the program that will be used for QM calculation.\n",
    "\n",
    "You can find more information about QM_Method [here](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment).\n",
    "\n",
    "For this workflow, we'll show how to write for both ORCA and Gaussian. When looking at two methods, you can see that the only difference between two methods are program, where one is ORCA and one is Gaussian. This attribute is used to specify the input file format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7ece80d6-f4de-4c2a-b882-9e974f0ed8a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use this for creating ORCA input file\n",
    "qm_method = QMzyme.QM_Method(\n",
    "    basis_set='6-31G*', \n",
    "    functional='wB97X-D3', \n",
    "    qm_input='OPT FREQ', \n",
    "    program='orca'\n",
    ")\n",
    "\n",
    "# Use this for creating Gaussian input file\n",
    "#qm_method = QMzyme.QM_Method(\n",
    "#    basis_set='6-31G*', \n",
    "#    functional='wB97XD', \n",
    "#    qm_input='OPT FREQ', \n",
    "#    program='gaussian'\n",
    "#)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0bafb520-8d44-43f1-915e-52e1d9d7f8db",
   "metadata": {},
   "source": [
    "After setting the method, we have to assign the method to the region. This can be done with `assign_to_region()` with your region of choice! When doing so, QMzyme will estimate the charge based on the commbination of charge information provided by the user with `residue_charge.update()` and standard AMBER residue charges. Alternatively, you can specify the charge and multiplicity of the system with `charge=` and `mult=`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3891948c-6c80-434d-a94f-ac6ded648deb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "QMzymeRegion cutoff_5 has an estimated charge of -2.\n"
     ]
    }
   ],
   "source": [
    "# Since we are not specifying the charge in this method below, the method\n",
    "# will estimate the charge based on residue naming conventions\n",
    "qm_method.assign_to_region(region=model.cutoff_5)\n",
    "\n",
    "# You would alternatively uncomment and run:\n",
    "#qm_method.assign_to_region(region=model.cutoff_5, charge=-2, mult=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a97f3e6c-b3e5-4d18-b2dd-6b62964b8ccb",
   "metadata": {},
   "source": [
    "## Truncate Model\n",
    "\n",
    "The last step before creating an input file is to use `model.truncate()`. This takes the full QMzymeModel (which, if you recall, is a region with a method), truncates it, and prepares it for input file generation. If not specified, `model.truncate()` will undergo `TerminalAlphaCarbon` truncation class, which will preserves peptide bonds between consecutive residues and remove terminal backbone amine and carboxylate groups. You can find more information about it [here](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.TruncationSchemes.html#QMzyme.TruncationSchemes.TerminalAlphaCarbon)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1ec497de-267b-40ce-8d30-1063e532cd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Truncated model has been created and saved to attribute 'truncated' and stored in QMzyme.CalculateModel.calculation under key QM. This model will be used to write the calculation input.\n",
      "Truncated model:  <QMzymeRegion cutoff_5_truncated contains 391 atom(s) and 33 residue(s)>\n",
      "Method details:  {'type': 'QM', 'qm_input': '6-31G* wB97X-D3 OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97X-D3', 'qm_end': '', 'program': 'orca', 'freeze_atoms': [2, 23, 42, 58, 78, 84, 105, 122, 129, 148, 164, 174, 191, 211, 227, 244, 263, 279, 292, 309, 319, 343], 'mult': 1, 'charge': -2}\n"
     ]
    }
   ],
   "source": [
    "model.truncate()\n",
    "print(\"Truncated model: \", QMzyme.CalculateModel.calculation['QM'])\n",
    "print(\"Method details: \", QMzyme.CalculateModel.calculation['QM'].method)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4515b22d-ed66-40e0-957a-ddf79aad6460",
   "metadata": {},
   "source": [
    "## Write QM Input File\n",
    "\n",
    "The final step now is to create the calculation input file. This can be done by using `model.write_input()` method, which will create a directory named QCALC and save the input file there. When this method is called, the store_pickle() method is triggered, and you will have the corresponding .pkl file for your QMzymeModel. This is especially useful because after your calculation is completed, you can then load the calculation results back into the corresponding QMzymeRegion using the QMzymeRegion method store_calculation_results(), passing it the calculation file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e602cab-691f-4e08-8172-2e1d365bef81",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.write_input()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}